Random Utility Theory-Based Stated Preference Elicitation Methods: Applications In Health Economics With Special Reference To Combining Sources of Preference Data

Author

  • Jordan J. Louviere
Abstract

This paper reviews random utility theory (RUT) based stated preference elicitation methods, discusses types of elicitation procedures and data consistent with RUT and how one can use RUT as a basis for establishing a level playing field on which to compare and test alternative elicitation procedures with one another and with behaviour in real markets. RUT provides a sound, behavioural-theoretic basis for many forms of preference elicitation procedures, and also provides the theory by which various forms of data can be combined and preference elicitation procedures compared and tested. Although closely identified with the preference elicitation paradigm associated with the design and execution of discrete choice experiments, RUT also provides a theoretical basis for a variety of other forms of non-experimentally based preference elicitation, such as direct observations of choices in real or hypothetical markets, ranking of multiattribute choice options, and so forth. RUT provides a framework within which to formulate and test a wide variety of statistical preference models, the most familiar being families of probabilistic discrete choice models. The vast majority of stated preference research in health economics has been conducted in the RUT paradigm using discrete choice statistical models; hence we focus on these applications. Such a focus necessarily requires a brief review of principles from the design of statistical experiments because these principles underlie much work in discrete choice experiments. This review leads us to conclude that much previous work in health economics involving choice experiments has been simplistic, has frequently violated principles of good design practice and has led to designs that have serious statistical flaws. Naturally, the latter raises serious questions about conclusions drawn from applications of these designs. In part, these deficiencies seem to stem from over-reliance on the stated preference literature in marketing and transport that emphasises prediction instead of understanding of underlying behavioural processes and has relied on design strategies associated with the study of single individuals, instead of samples and populations. Following this brief overview of theory and past practice, we compare elicitation methods and discuss combining sources of preference data to test external and cross-validity of methods and investigate differences in measurement reliability. That is, RUT provides an elegant framework within which to compare and test preference elicitation procedures on the same or different samples/populations, and we briefly discuss the theory and illustrate how it can be applied to these problems. Finally, we discuss a variety of serious, unresolved issues in the formulation and estimation of discrete choice models, and link these issues to empirical generalisation and the reliability and validity of model choice results. That leads us to discuss the need to understand that actions taken by policy makers or experimenters can impact different moments of outcome, response and/or error distributions. We note that virtually all previous research has focused on the impacts of events, actions, variables, experiments, etc, on means of outcome or response distributions. Not only is the latter focus limiting, but we discuss why one should expect impacts on variances of distributions, including variances of error distributions.
We close with a brief review of empirical research in this area as a lead-in to suggestions for further research, as well as ways to potentially resolve some of the serious unresolved issues noted in the discussion.

1.0 Introduction

The purpose of this paper is to review Random Utility Theory (RUT) based preference elicitation procedures (PEPs) that have been or could be used in health economics. This review requires us to discuss the conceptual background and relevant results associated with combining sources of preference information. Additionally, in so far as is necessary, we also discuss issues related to modeling preferences in the RUT family of statistical models, which leads to discussion of interpretation and use of choice models in health economics. Interest in and use of stated preference (SP) elicitation methods in health economics is growing, as evidenced by discrete choice modeling workshops at Odense, Denmark (2002) and the University of Oxford (2003), as well as SP courses (Odense, 2002; Oxford 2003) and tutorials (iHEA 2003). Interest in SP theory and methods by economists is relatively recent, although there has been long-standing interest in Contingent Valuation theory and methods in environmental and resource economics (eg, Mitchell and Carson 1989). Interest in more general SP methods was encouraged by the Arrow-Solow Committee's review (Arrow, et al, 1993) of Contingent Valuation, and since 1994, research on SP methods in environmental and resource economics and related areas (eg, agricultural economics) has steadily increased (see, eg, Adamowicz, Louviere and Williams 1994). Also, environmental and resource economists interested in SP theory and methods actively participated in the Odense and Oxford workshops and courses, allowing health economists to absorb lessons learned and advance the state of practice faster than otherwise might have been the case.

The remainder of the paper is organized as follows. We first review random utility theory (RUT) based stated preference elicitation methods, and discuss the use of RUT to create a level playing field to compare and test alternative PEPs with one another and with real choices. Next we review and discuss serious, unresolved issues in choice models used to analyse RUT-consistent choice data, and then review evidence that the RUT assumptions that must be satisfied to obtain unbiased estimates are routinely violated. Finally, we briefly describe three ways to resolve the latter issues and move the field forward.

2.0 RUT-based PEPs

RUT provides a sound, behavioural-theoretic basis for many forms of preference elicitation procedures and a way to compare and test them. RUT typically is identified with PEPs that are associated with the design of discrete choice experiments (eg, Louviere and Woodworth 1983). Yet, RUT also provides a sound theoretical basis for many other types of nonexperimental PEPs, such as direct observations of choices in real or hypothetical markets, ranking of multiattribute options in surveys, etc. Generally, PEPs that yield measures that satisfy order and equality also satisfy RUT, and hence many possible PEPs can satisfy RUT, such as discrete choices, "pick any" choices, rankings, ratings (if transformed to rankings), best and worst choices, and many more.
Consider the basic axiom of RUT: Uin = Vin + εin, where Uin is the latent utility individual n associates with choice option i, Vin is a systematic, observable (explainable) component of utility and εin is a random, unobservable (unexplainable) component. The systematic utility component typically is expressed as a linear-in-the-parameters generalised regression specification, such as: Vin = ∑k βki Xki(n), where βki is the preference parameter associated with the k-th attribute of the i-th option, Xki(n) is the corresponding element of the design matrix of K attributes associated with the i-th option, and the subscript (n) indicates that some subset of the attributes may be covariates that vary from individual to individual. Ben-Akiva and Morikawa (1990) show that RUT predicts that the vector of common estimated model coefficients for one PEP (A) must be proportional to the vector of common estimated coefficients for another PEP (B). That is, βki(A) = λ βki(B), where λ is a constant of proportionality equal to the ratio of the random component standard deviations, σε(B)/σε(A). Thus, if the underlying preference processes are the same (ie, preference invariance holds), and one uses two different PEPs to elicit preferences, and if both PEPs satisfy RUT, the two vectors of preference/utility parameters must be proportional. For example, if one elicits preferences for health-related attributes from a discrete choice experiment (DCE) and a contingent rating or ranking task, the vectors of preference parameters estimated from both PEPs should be proportional if the underlying preference processes are the same. Similarly, if one observes choices among health options in real markets from the same or independent samples of individuals, and a common subset of attributes is observed/present in both data sources, preferences revealed by real choices and stated choices should be proportional if the underlying preference processes revealed by each PEP are the same and both PEPs satisfy RUT (Swait and Louviere 1993). Thus, one can pool revealed (RP) and stated preference (SP) data, or RP and RP data, or SP and SP data, or many other PEP combinations, such as multiple SP and multiple RP. Swait and Louviere (1993) show how to test the hypothesis that PEP parameter vectors are proportional; however, one typically would want to extend the grid search approach that they illustrated to a full information maximum likelihood (FIML) test to obtain confidence intervals to test whether estimated scale parameters are equal to and/or less than (greater than) some hypothesized value, such as one. Specifically, Swait and Louviere proposed testing the hypothesis of preference invariance for RP and SP data by stacking both data sources and estimating a model that restricts the common attribute parameters to be equal, with a single variance-scale ratio parameter that is the proportionality constant. One then compares the restricted model LogLikelihood (LL) to the sum of the LL's from separate models estimated, respectively, from RP and SP data sources. These separate models have K common parameters each; hence there are 2K common parameters in total, with the restricted model having K+1 parameters (the additional parameter is the variance-scale ratio). Hence, the degrees of freedom (df) for the test are 2K - (K+1) = K - 1. Louviere, Hensher and Swait (2002) review many empirical comparisons of RP and SP data sources, and report that the hypothesis of preference invariance seems to hold to a close first approximation in the majority of cases. McFadden (2001) also notes that there is compelling evidence of preference regularity for RP and SP data.
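To make the mechanics of the pooled-data test concrete, the minimal sketch below implements the likelihood-ratio comparison just described, assuming one has already estimated separate MNL models on the RP and SP data and a stacked model with common parameters plus a single variance-scale ratio. The function name and the numerical log-likelihood values are hypothetical and purely illustrative; this is not Swait and Louviere's code.

```python
# A minimal sketch of the preference invariance (scale ratio) test described above.
from scipy.stats import chi2

def preference_invariance_test(ll_rp, ll_sp, ll_pooled, K):
    """Likelihood-ratio test of preference invariance across two data sources.

    Unrestricted: two separate models with K common parameters each (2K in total).
    Restricted (pooled): K common parameters + 1 variance-scale ratio = K + 1.
    Degrees of freedom: 2K - (K + 1) = K - 1.
    """
    lr_stat = -2.0 * (ll_pooled - (ll_rp + ll_sp))
    df = K - 1
    p_value = chi2.sf(lr_stat, df)
    return lr_stat, df, p_value

# Hypothetical log-likelihood values, for illustration only.
print(preference_invariance_test(ll_rp=-812.4, ll_sp=-1045.7, ll_pooled=-1862.3, K=6))
```

A non-significant statistic would be consistent with preference invariance (proportional parameter vectors), while a significant statistic indicates that pooling the two sources with a single scale ratio is not supported.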
2.1 General RUT results for PEPs

The Luce and Suppes Ranking Theorem (Handbook of Mathematical Psychology 1965) provides a key RUT result, namely that preference order and equality information can be used to impute implied choices. For example, if the preference order for goods A, B, C and D is 1,2,3,4 (1 most preferred), one knows that A>B, A>C, A>D, B>C, B>D, C>D, A>B,C, A>B,D, A>C,D, B>C,D and A>B,C,D (where, eg, A>B,C denotes that A would be chosen from the set {A,B,C}). As well, if one knows that A and C are preferred to B and D, one then knows that A>B, A>D, C>B, C>D, A>B,D & C>B,D. Thus, one can "expand" or "explode" ranking data to obtain additional preference information to estimate RUT-based choice models. Having said that, it is worth noting that the outlook for pure ranking data is suspect because empirical tests consistently have shown that ranking data do not satisfy RUT. That is, preference parameters estimated from different ranking depths have been shown not to be equal after taking scale differences into account (see reviews in Carson, et al 1993 and Louviere, et al 1999). Moreover, other PEP response modes seem to be more promising empirically; for example, best-worst choices, discrete choices, etc.

2.2 More general RUT models

To date most applications of RUT have dealt with discrete choices, but RUT also underlies other responses/outcomes, such as choice of travel mode to work plus departure times (or vice-versa); choice of GP plus visitation frequency; or, more generally, more complex cases such as the following: one becomes aware of a new OTC hay fever drug, gets interested in its benefits (eg, non-drowsy, lasts 24 hours), considers costs and usage rates, decides how to trade off benefits/costs, then decides whether to try the drug now, wait for more information before trying, or never try. Such conditional, sequential decisions can be formulated as RUT models. For example, labour economists and marketers (inter alia) have formulated and estimated discrete/continuous or discrete/ordered models (eg, DeSarbo and Choi 1999; Bucklin and Sismeiro 2003) where one observes quantities purchased only if there is a purchase, or one observes categories of contract length only if one signs a contract.

2.3 PEPs consistent with RUT

We focus on discrete choice responses because they are widely used in health economics and other fields, but we note that category ratings, "pick any" choices and "best-worst" choices also can be consistent with RUT (in principle, if not in empirical fact). Let us begin by considering the case of four Australian health insurance providers, such as MBF, HCF, NIB, NRMA. Suppose we ask a respondent to rate each insurer on a 1-7 scale expressing degree of preference, and we observe the ratings shown in parentheses: MBF(3), HCF(6), NIB(5) and NRMA(4). Ignoring any cardinal (interval) information in the ratings (a controversial subject, to say the least!), and focusing only on the order information implied by the ratings, we apparently can say that HCF > NIB > NRMA > MBF. However, we also can say that HCF>(NIB,NRMA,MBF); HCF>(NIB,NRMA); HCF>(NIB,MBF); HCF>(NRMA,MBF); HCF>NIB; HCF>NRMA; HCF>MBF and NIB>(NRMA,MBF); NIB>NRMA; NIB>MBF & NRMA>MBF. Thus, the respondent's rating responses can be expanded to obtain more detailed and comprehensive preference information.
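The minimal sketch below illustrates this "exploding" of a ranking (or of ratings treated as a ranking) into the implied choices described by the Luce and Suppes result. The function name is hypothetical; the insurers and their implied order are those of the example above.

```python
# A minimal sketch of expanding a preference ranking into the implied choices.
from itertools import combinations

def explode_ranking(ranking):
    """Given options ordered from most to least preferred, return every implied
    (choice set, chosen option) pair for all subsets of size two or more."""
    implied = []
    for size in range(2, len(ranking) + 1):
        for subset in combinations(ranking, size):
            # The most preferred member of each subset is the implied choice.
            chosen = min(subset, key=ranking.index)
            implied.append((set(subset), chosen))
    return implied

# Ratings MBF(3), HCF(6), NIB(5), NRMA(4) imply the order HCF > NIB > NRMA > MBF,
# which yields the 11 implied choices listed in the text.
for choice_set, chosen in explode_ranking(["HCF", "NIB", "NRMA", "MBF"]):
    print(sorted(choice_set), "->", chosen)
```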
Now let us consider the same four insurers, and ask a respondent to express preferences using a "pick any" choices response format. For example, suppose we ask the respondent to tell us whether she would deal with (yes) or not deal with (no) each insurer, and her responses are as shown in parentheses: MBF(no), HCF(yes), NIB(yes) and NRMA(no). Now, we apparently can infer that HCF>(MBF,NRMA); HCF>MBF; HCF>NRMA & NIB>(MBF,NRMA); NIB>MBF and NIB>NRMA. Thus, "pick-any" data also can be expanded to obtain additional, more detailed information on preference relations among objects. Now, let us ask a respondent to express preferences for the four health insurers using a "Best-Worst" response format. For example, suppose we ask her to tell us which insurer she prefers most (HCF) and which insurer she prefers least (MBF). Based on these two answers we can infer that HCF>(NIB,NRMA,MBF); HCF>(NIB,NRMA); HCF>(NIB,MBF); HCF>(NRMA,MBF); HCF>NIB; HCF>NRMA; HCF>MBF and NIB>MBF & NRMA>MBF. Thus, "Best-Worst" data also can be expanded to provide additional, more detailed preference information. Finally, let us consider asking our respondent to express preferences for the health insurers using a discrete choice response format. For example, we ask her to tell us which ONE insurer she prefers most and/or we ask her what insurance cover she has (if any), and she says "HCF". Now, we can infer that HCF>(NIB,NRMA,MBF), and we also can see that discrete choice preference responses provide minimal information about order relations, despite their popularity. Thus, it behooves us to ask whether it makes sense to make use of more informative PEPs and/or ask additional preference elicitation questions. In turn, this raises the question of whether "more informative" PEPs are consistent with RUT. Here the results are decidedly mixed and/or there is less here than meets the eye. Few empirical tests have been conducted that address the foregoing questions, but we can say that it is well-known that ratings rarely satisfy the equal-interval scale difference assumption required to yield cardinal measures. Thus, research is needed to determine whether using category rating responses to infer order relations is consistent with RUT. As previously noted, a number of tests have been made of ranking responses, and these tests consistently show that rankings are inconsistent with RUT (Carson, et al 1993). There have been no tests of Best-Worst (BW) elicitation, but because BW is inherently a simple extension of discrete choices, it is likely that this format is consistent with RUT; nonetheless, we need to establish whether this is empirically supported. It also is worth noting that the additional observations obtained from some PEPs may not be independent, and research is needed into this issue.
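The following minimal sketch simply counts the implied pairwise preference relations that each response format in the worked example yields, which makes the differences in informativeness across formats explicit. Function names and data are illustrative only.

```python
# A minimal sketch comparing the ordinal information yielded by each format above,
# counted as implied pairwise preference relations among the four insurers.

def pairs_from_ranking(order):
    """Full ranking: every pair of options is ordered."""
    return {(a, b) for i, a in enumerate(order) for b in order[i + 1:]}

def pairs_from_pick_any(picked, not_picked):
    """'Pick any': every picked option beats every non-picked option."""
    return {(a, b) for a in picked for b in not_picked}

def pairs_from_best_worst(best, worst, others):
    """Best-worst: best beats everything, and everything beats worst."""
    return {(best, o) for o in others + [worst]} | {(o, worst) for o in others}

def pairs_from_discrete_choice(chosen, others):
    """Single discrete choice: chosen beats the rest; the rest are unordered."""
    return {(chosen, o) for o in others}

print(len(pairs_from_ranking(["HCF", "NIB", "NRMA", "MBF"])))             # 6 relations
print(len(pairs_from_pick_any(["HCF", "NIB"], ["MBF", "NRMA"])))          # 4 relations
print(len(pairs_from_best_worst("HCF", "MBF", ["NIB", "NRMA"])))          # 5 relations
print(len(pairs_from_discrete_choice("HCF", ["NIB", "NRMA", "MBF"])))     # 3 relations
```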
3.0 Measuring Attributes

A stylized conceptual framework for the process by which actions taken by managers or policy makers ultimately impact choices in real markets is in Figure 1 (see also Hensher, Louviere and Swait 1999). This hierarchical and sequential process recognises that there are several intermediate stages that imply intermediate measures.

Figure 1: How Managerial Actions Map Into Observed Marketplace Choices
Xi: a matrix of variables measuring actions taken by managers/politicians (strategic planning)
Ski = f1k(Xi): a mapping of actions into perceived attribute positions of option(s) (psychophysics)
uki(Ski) = f2k(Ski): a mapping of option attribute positions into the utilities of these positions (utility formation)
Ui = f3[uki(Ski)]: a mapping of attribute position utilities into utility values of holistic options (utility function)
P(Choosing) = f4(Ui): a mapping of holistic option utilities into decisions to choose now, wait or never choose (choice process)
P(i|choose) = f5(Ui): a mapping of holistic option utilities into option choices, given a decision to choose now (aggregate demand at time t)

The conceptual framework suggests that one can "mix and match" measures, which can help to reduce specification error and enhance estimation efficiency by including measures in models that otherwise would be "omitted". Thus, one can combine data from various sources that provide measures of these quantities for model estimation purposes, which can assist with identification, particularly in revealed preference data sources. In contrast to this view of the way that actions taken by managers/producers/policy makers in markets ultimately impact market choices, economists have traditionally measured elements of X for each option, and associated these elements with choices of options, the final stage in the process. Economic theory does not suggest that one should do that; however, if one can do this directly, there are advantages in estimating what is in effect a composite function. Unfortunately, however, in real markets and data sources, the elements of interest (ie, xki ∈ {X}) typically are very skewed, exhibit little variation, are correlated and/or pose serious identification problems (eg, are constant or unobserved). Thus, one often can substitute Ski, which requires one to measure Ski and/or combine data sources. Ski quantities are commonly measured in marketing; for example, perceived service quality (sq), perceived locational convenience (lc) and perceived array of options or selection (ao). There are various ways to measure these quantities, but one useful way is the Best-Worst Scaling (BWS) method proposed by Finn and Louviere (1992). Let there be J objects (eg, brands) to be measured. BWS places the J objects into choice sets using fractions of 2^J factorial designs. A 2^J factorial design maps into such design problems because the set of all possible choice sets of J objects is given by a 2^J factorial. That is, objects are in/out of sets, and all possible combinations of in/out are given by the 2^J factorial (Louviere and Woodworth 1983). BWS asks respondents to select two objects that are, respectively, the best/highest/largest and worst/lowest/smallest objects in each set. Finn and Louviere show that simple frequency counts give a subjective measurement scale for the objects. Figure 2 illustrates BWS measures for service quality (sq), locational convenience (lc) and array of options from which to choose (ao).

Figure 2: A Simple BW Score Matrix for Four Health Insurers
Attribute   NIB   HCF   MBF   NRMA
SQ          8     4     1     2
LC          4     8     2     1
AO          2     4     1     8

One can use the BWS measures in choice models like traditional elements of {X}, as shown in Figure 3 for the case of one choice set containing four insurers, where NIB is the chosen option (ie, choice data from one choice set for one respondent).
Figure 3: Coding Subjective Measures For Choice Modelling
Choice set   Choice option   Choice indicator   Service quality   Locational convenience   Array of options
1            1 (NIB)         1                  8                 4                        2
1            2 (HCF)         0                  4                 8                        4
1            3 (MBF)         0                  1                 2                        1
1            4 (NRMA)        0                  2                 1                        8

BWS scores arise from the process of choosing best and worst options in all 2^J possible choice sets. That is, if an individual has a preference ranking over the J objects, and she chooses best and worst objects consistent with her ranking of them in each set, she should choose as follows for ranks 1 to 4 (number of choices in parentheses): 1(8), 2(4), 3(2) and 4(1). That is, the most and least preferred total choices will be as in Figure 4.

Figure 4: Expected Choice Totals from a 2^J Task for a Rank-Consistent Subject
Insurer      Best totals   Worst totals
1 (NIB)      8             1
2 (HCF)      4             2
3 (MBF)      2             4
4 (NRMA)     1             8

Thus, choice frequency(most) x choice frequency(least) = k (here, 8), so f(B) = k/f(W). Thus, the responses to Best and Worst choice set questions provide a ranking, which in turn provides choice frequency totals that one can use to measure Ski. Of course, there are other ways to measure variables in the conceptual framework. For example, one can measure attributes with multiple choice or similar questions (Louviere, Hensher and Swait 2002). For example, let an attribute of a GP be the source of his/her medical degree, and let the levels be as follows: a) established, well-known medical school in same country; b) new medical school in same country; c) established, well-known medical school in different country; d) new medical school in different country. A potential multiple choice question to measure the levels for a person's current (or last) GP is illustrated below:

Which ONE of the following best describes where your current GP obtained their medical degree (tick one)?
established, well-known medical school in same country
new medical school in same country
established, well-known medical school in different country
new medical school in different country

Questions like the above often are used to measure attributes of choice options when the {X} measures are unavailable. For example, status quo options often vary from individual to individual, with exact measures being unobserved and/or "perceived" by individuals. Thus, one can obtain useful measures of independent variables in several ways, which allows one to mix and match measures by combining various sources of such information.
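A minimal sketch of the Finn and Louviere counting approach is given below: one simply tallies how often each object is chosen best and worst across the choice sets a respondent answered. The responses and function name are hypothetical; for a rank-consistent respondent who answered all 2^J sets, the totals would follow the pattern in Figure 4, and a common summary score is the best count minus the worst count.

```python
# A minimal sketch of Best-Worst Scaling frequency counts for one respondent.
from collections import Counter

def bws_totals(responses):
    """responses: list of (choice_set, best_pick, worst_pick) tuples."""
    best = Counter()
    worst = Counter()
    for _, b, w in responses:
        best[b] += 1
        worst[w] += 1
    return best, worst

# Hypothetical responses over a few sets of insurers, for illustration only.
resp = [({"NIB", "HCF", "MBF", "NRMA"}, "NIB", "MBF"),
        ({"NIB", "HCF", "MBF"}, "NIB", "MBF"),
        ({"HCF", "MBF", "NRMA"}, "HCF", "MBF"),
        ({"NIB", "NRMA"}, "NIB", "NRMA")]
best, worst = bws_totals(resp)
for obj in ["NIB", "HCF", "MBF", "NRMA"]:
    # Best-minus-worst count as a simple subjective measurement scale.
    print(obj, best[obj] - worst[obj])
```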
4.0 Discrete Choice PEPs

Louviere and Woodworth (1983) apparently were the first to see the isomorphism between RUT and statistical design theory for discrete choices. That is, they recognised that choice model data sets are simply large, sparse, (typically) poorly conditioned contingency (crosstab) tables. In particular, the discrete multivariate statistical literature provides statistical theory and models for analysing contingency table data (eg, Bishop, Fienberg and Holland 1975), which map into forms of statistical choice models. Contingency tables are nonexperimental analogs of factorial designs. Louviere and Woodworth (1983) recognized the isomorphism, and proposed that one could design choice experiments that are systematically incomplete contingency tables. Once one views the design problem in this way, it is easy to see that one also can simulate a wide range of choice situations that occur in real markets as well as insure satisfaction of key statistical properties of choice models. Their work led to a large and growing literature in the design and analysis of discrete choice experiments recently summarized by Louviere, Street and Burgess (2003). Discrete choice experiments (DCEs) can mimic the properties of real markets to an arbitrary degree of accuracy limited only by time and resources. In fact, almost all real markets can be simulated by choice experiments, including designing and executing experiments in real markets to the extent that resources are available to do so. Two key DCE statistical properties studied in the literature are identification and efficiency. Identification refers to the range of specifications of utility functions that can be estimated from choice data conditional on particular forms of choice models. Choice experiments can be designed to insure that identification is satisfied (Louviere, Hensher and Swait 2002), but DCEs also involve human beings, which suggests that certain non-statistical properties of choice experiments, like complexity and realism, also must be satisfied. That is, DCE tasks should be no more complex than tasks humans face in real markets, and the closer experiment and task are to what humans do, the greater the face validity.

4.1 The general DCE problem

Let there be Xk attributes, and let each have Lk levels (we can relax this to let each Xk have different numbers of levels). All goods that can be described by the Xk attributes are given by the factorial of their levels; that is, ∏k (Lk) = T, where T = number of combinations of attribute levels. All possible choice sets of the T goods are given by the 2^T factorial (2 levels = in/out of a set) = C, where C is the number of total possible choice sets. "Designs" represent systematic ways to sample from C, such that certain statistical properties are optimised (eg, statistical efficiency and identification of model effects). There are many ways to optimise designs, but a useful and widely used criterion is D-optimality, which refers to maximising the determinant of the information matrix (see, eg, Louviere, Street and Burgess 2003).

4.2 Object-based DCEs

We begin discussing DCEs by considering the relatively straightforward problem of designing a choice experiment to vary only "objects." For example, let NIB, HCF, MBF, NRMA, MU, AXA be "brands" of insurance (objects), and let each "brand" have 2 levels (in/out of a choice set), resulting in 2^6 possible choice sets. One can sample from the 2^6 by using a fraction to create sets. Each choice set represents a unique set of brand names, as shown in Figure 5a, where an orthogonal main effects plan (OMEP) was used to make eight choice sets. We also added the option of choosing "none" to each set for realism.

Figure 5a: Using Orthogonal Main Effects Plans (OMEP) To Make Choice Sets
Set   NIB   MBF   NRMA   MU   AXA   None   Choice
1     P     P     P      P    P     P      NIB
2     P     P     A      P    A     P      NIB
3     P     A     P      A    A     P      NIB
4     P     A     A      A    P     P      NIB
5     A     P     P      A    P     P      AXA
6     A     P     A      A    A     P      MBF
7     A     A     P      P    A     P      MU
8     A     A     A      P    P     P      AXA
P = present in, A = absent from each set

Suppose one also has BWS attribute scores (data) from some source, as in Table 5b:

Table 5b: Best-Worst Attribute Scores (as per previous examples)
Attribute   NIB   MBF   NRMA   MU   AXA
SQ          16    4     8      2    1
LC          4     8     2      1    16
AO          2     4     1      16   8
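The minimal sketch below makes the design space concrete: each possible choice set corresponds to one presence/absence pattern in the 2^J factorial, and a designed fraction such as the OMEP in Figure 5a samples a handful of those sets. It uses the five brands that appear in Figure 5a and is purely illustrative.

```python
# A minimal sketch of the object-based design space: all possible choice sets of J
# objects are the in/out patterns of a 2^J factorial; a design samples a fraction.
from itertools import product

brands = ["NIB", "MBF", "NRMA", "MU", "AXA"]

# All 2^5 = 32 presence/absence patterns; drop the empty set (no choice to make).
all_sets = [tuple(b for b, present in zip(brands, pattern) if present)
            for pattern in product([1, 0], repeat=len(brands))]
candidate_sets = [s for s in all_sets if len(s) >= 1]

print(len(all_sets), "patterns in the full 2^J factorial")
print(len(candidate_sets), "non-empty candidate choice sets")

# The eight sets used in Figure 5a (a 'none' option is appended to each for realism).
omep_sets = [("NIB", "MBF", "NRMA", "MU", "AXA"),
             ("NIB", "MBF", "MU"),
             ("NIB", "NRMA"),
             ("NIB", "AXA"),
             ("MBF", "NRMA", "AXA"),
             ("MBF",),
             ("NRMA", "MU"),
             ("MU", "AXA")]
print(len(omep_sets), "sets used in the designed fraction")
```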
One can combine the BWS scores with choice data from a choice experiment like that shown in Figure 5a plus data on policy costs, as illustrated in Table 5c (policy costs are hypothetical, and are "invented" for illustrative purposes).

Table 5c: Coding Choice Data in Figure 5a + Scores in Table 5b + Costs
Set  Alt  Choice  NIB  MBF  NRMA  MU  AXA  SQ  LC  AO  Cost
1    1    1       1    0    0     0   0    16  4   2   $700
1    2    0       0    1    0     0   0    4   8   4   $650
1    3    0       0    0    1     0   0    8   2   1   $625
1    4    0       0    0    0     1   0    2   1   16  $675
1    5    0       0    0    0     0   1    1   16  8   $725
1    6    0       0    0    0     0   0    0   0   0   0
2    1    1       1    0    0     0   0    16  4   2   $700
2    2    0       0    1    0     0   0    4   8   4   $650
2    4    0       0    0    0     1   0    2   1   16  $675
2    6    0       0    0    0     0   0    0   0   0   0
3    1    1       1    0    0     0   0    16  4   2   $700
3    3    0       0    0    1     0   0    8   2   1   $625
3    4    0       0    0    0     1   0    2   1   16  $675
3    6    0       0    0    0     0   0    0   0   0   0
4    1    1       1    0    0     0   0    16  4   2   $700
4    5    0       0    0    0     0   1    1   16  8   $725
4    6    0       0    0    0     0   0    0   0   0   0
5    2    0       0    1    0     0   0    4   8   4   $650
5    3    0       0    0    1     0   0    8   2   1   $625
5    5    1       0    0    0     0   1    1   16  8   $725
5    6    0       0    0    0     0   0    0   0   0   0
6    2    1       0    1    0     0   0    4   8   4   $650
6    6    0       0    0    0     0   0    0   0   0   0
7    3    0       0    0    1     0   0    8   2   1   $625
7    4    1       0    0    0     1   0    2   1   16  $675
7    6    0       0    0    0     0   0    0   0   0   0
8    4    0       0    0    0     1   0    2   1   16  $675
8    5    1       0    0    0     0   1    1   16  8   $725
8    6    0       0    0    0     0   0    0   0   0   0

One also can recode the data in Table 5c by mean-centering each BWS score and cost about their respective means to insure that all estimated brand constants equal their mean choice propensities, and to insure that the utility of the choice of none equals zero. Any valid variable measures can be used, including real market measures of SQ, LC, AO and costs.

4.3 Attribute-based DCEs

Attributes can be categorical or continuous, but most DCEs discretise continuous attributes like costs so that they vary over several discrete levels. Thus, DCEs typically use discrete levels for all attributes. Choice experiments can be designed sequentially or simultaneously. For sequential designs, one first creates T attribute combinations using one design, and assigns the T combinations (attribute descriptions/bundles) to choice sets with a second design (hence, "sequential design"). Typically, the first design is used to insure that one or more utility specifications (eg, main effects only/strictly additive) can be estimated (identification), while the second design is used to insure that one or more choice process models (eg, MNL) can be estimated. The efficiency of the estimates is a function of both designs, and bias depends on satisfaction of the assumptions that underlie the choice of both designs. Simultaneous designs involve one design that simultaneously creates attribute combinations and assigns them to sets (Louviere and Woodworth 1983; Louviere, Hensher and Swait 2002). Many SP researchers in health economics designed DCEs in the following way: make T treatment combinations/attribute descriptions, and use one of the T as a reference to make T-1 pairs to be evaluated by a sample of subjects. Others used randomly assigned pairs of T's, or similar "conjoint" methods to generate designs, display treatments and elicit ratings, rankings or choices. This approach to designing T treatments (using one as a reference to make T-1 pairs) typically leads to VERY statistically inefficient designs relative to optimally efficient designs (ie, relative efficiencies of 40% or less). The resulting designs also are not orthogonal in absolute attribute levels or attribute level differences (or contrasts in the case of qualitative attributes), at least some higher order effects may not be identified, and they rarely make sense in most applications. Additionally, many designs involving choices between designed treatment combinations and a status quo option are very inefficient, and degrees of statistical efficiency depend on how skewed the attribute levels of the status quo option are with respect to the overall space spanned by the attribute levels.
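One way to see why reference-pair designs tend to be statistically inefficient is to compute a D-error for candidate designs directly. The minimal sketch below does this for an MNL model evaluated at beta = 0 (the working assumption used in the Street and Burgess results cited in the next paragraph); the two small paired designs are hypothetical and purely illustrative, the second being built around a single reference option.

```python
# A minimal sketch of computing the MNL information matrix and D-error of a design.
import numpy as np

def mnl_information(design_sets, beta=None):
    """design_sets: list of (J_s x K) attribute arrays, one per choice set."""
    K = design_sets[0].shape[1]
    M = np.zeros((K, K))
    for Z in design_sets:
        v = Z @ beta if beta is not None else np.zeros(Z.shape[0])
        p = np.exp(v) / np.exp(v).sum()            # MNL choice probabilities
        W = np.diag(p) - np.outer(p, p)            # probability weight matrix
        M += Z.T @ W @ Z                           # information contribution of the set
    return M

def d_error(design_sets, beta=None):
    M = mnl_information(design_sets, beta)
    K = M.shape[0]
    return np.linalg.det(M) ** (-1.0 / K)          # lower D-error = more efficient

# Two hypothetical paired designs over three effects-coded attributes.
pairs_a = [np.array([[1., 1., 1.], [-1., -1., -1.]]),
           np.array([[1., -1., 1.], [-1., 1., -1.]]),
           np.array([[1., 1., -1.], [-1., -1., 1.]])]
pairs_b = [np.array([[1., 1., 1.], [1., -1., -1.]]),
           np.array([[1., 1., 1.], [-1., 1., -1.]]),
           np.array([[1., 1., 1.], [-1., -1., 1.]])]   # all pairs share one reference option
print(d_error(pairs_a), d_error(pairs_b))              # the reference-pair design does worse
```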
Discussions and derivations of optimally efficient designs for choice experiments can be found in Burgess and Street (2001), Street and Burgess (2003) and Louviere, Street and Burgess (2003). Another popular approach to designing choice experiments is to use some type of random assignment, like randomly assigning T total treatment combinations/attribute descriptions (say t ∈ T) to pairs, triples, etc. In principle, random assignment should work, as the central limit theorem requires that as the ratio t/T approaches one (t a sample from T), the statistical properties of t approach those of T. Unfortunately, however, in the case of DCEs, T usually is large, although many health economists make use of small, simplistic attribute sets based on vague and empirically unsupported notions of "overburdening respondents". In any case, t must be fairly large to be reasonably sure that the properties of t approximate those of T when T is large. Thus, Louviere, Hensher and Swait (2002) note that it rarely makes much sense not to use designs to INSURE satisfaction of the statistical properties that one desires/requires. Many health economists involved in SP research also seem unaware that utility functions estimated from DCEs can be both generic and alternative-specific. Generic utility specifications restrict parameter estimates to be the same for all choice options. Alternative-specific utility specifications allow parameter estimates to differ for at least one attribute of one choice option. Choice model utility specifications also can involve mixtures of generic and alternative-specific effects. Generally, however, there is little theory to guide the choice of utility specifications. "Alternative-specificness" is the general case, whereas "genericness" is a restricted, special case (a testable restriction). Many health economics applications of DCEs have used small t ∈ T (eg, < 32), which seems to be associated with uncritical acceptance of conjoint analysis practices in marketing and transport research where small designs predominate. Use of minimal or at least "small" designs in marketing and transport appears to be an historical artifact inherited from the use of individual-level conjoint analysis methods to study preferences (see, eg, Louviere 1988). Unfortunately, designs that might be appropriate for obsolete conjoint analysis methods that focus on modeling individuals are typically inappropriate for modeling the choice behaviour of populations of consumers. Thus, use of small designs confuses individual-level "conjoint analysis" with DCEs, and perpetuates the successful marketing illusion that DCEs are simply "another type of conjoint analysis". Indeed, it is widely believed that "modeling" individuals requires "smallish designs," but in contrast to the equivalent of widely held "academic urban myths" in marketing and transport research, there is considerable evidence that humans will "do" dozens (even hundreds) of T's (see, eg, Louviere 2001, and Louviere, et al 2000). Small designs are appropriate if one wants precise estimates of choices for only THE T treatments in a certain design. However, this is rarely an appropriate design objective because researchers typically want to generalise to much larger response surfaces spanning the range of possibilities in T faced by the population of consumers/choosers of interest. Indeed, one often can have one's cake and eat it too by designing a common set, say Tc (small), plus a general set, say Tg (large), that are combined into a single experiment.
That is, one assigns all Tc to each subject so that subjects can be compared with respect to a common subset of sets/choices, and also assigns a sample/block drawn from Tg to each subject to enable one to estimate models that will generalize to as much of the response surface of interest as possible.

4.4 Modelling responses to DCEs

Choice models that assume IID errors, like MNL, are almost certainly wrong because errors for each individual and/or observation cannot be IID for reasons discussed later. Furthermore, "errors" are not unidimensional; hence, components of variance forms are more appropriate ways to view the situation. However, to our knowledge, only Cardell (1997) has proposed components of variance forms for choice models, which is surprising, to say the least. In any case, there can be many unobserved (random) effects, and the vast majority of them are unlikely to be associated only with preference heterogeneity, despite the widespread attention devoted to this one source of unobserved variability in the past decade to the virtual exclusion of other potential sources. For example, it is unclear why anyone would assume that individual consumers have IID random components/errors other than for mathematical convenience. Psychologists who conduct replicated experiments on the same subjects can testify to the fact that random components differ from individual to individual. Thus, IID errors are a "convenient" assumption, and if they are true, elegant theorems associated with recent complex statistical forms like mixed logit can be deduced that suggest that mixtures of logits can approximate any unknown choice process to an arbitrary degree of accuracy. On the other hand, if individuals' errors are not IID, virtually all published choice model estimates would be biased, and the bias would likely be serious. More specifically, if individuals vary in their error variances, estimated distributions of preference parameters confound the true underlying preference parameter distribution with the distribution of error variances, and as these quantities are perfectly inversely related in RUT models, the resulting estimates will be confounded, regardless of how well models seem to fit or predict in cross-validation tests. It also is unclear why anyone would assume that utility specifications are the same for all individuals in a sample population. Again, this is a convenient assumption, but demonstrably untrue as shown by numerous published studies in the judgment and decision-making literature in psychology and related areas (and in papers published by the present author – see, eg, Louviere 1988). Specifically, if individuals make decisions in different ways, such that the forms of their utility specifications differ, there will be omitted variables bias in the empirical estimates of choice models that do not take this into account, and the bias can be large, as Johnson, Hardie, Meyer and Walsh (2003) find in simulations. The consequences of these confounds are quite significant, as we now demonstrate. First consider the example in Table 6, which involves 10 subjects (0 to 9) in two different conditions. In case 1, all individuals have a constant scale and identical preferences for time and cost, but their model intercepts vary, leading to the utilities in column U1. In case 2 the individuals are exactly the same as in case 1, but their scales vary, leading to the utilities in column U2.
That is, the values in the table for case 2 are obtained by multiplying the values in case 1 by the column labeled "Scale2" in case 2. What was clearly and unequivocally a homogeneous group of subjects with respect to their preferences for time and cost now seems to be very heterogeneous, and all that was required to produce this "change" was for scales or random component variances to differ across subjects. Thus, current discrete choice models CANNOT DISTINGUISH subjects who are heterogeneous in preferences from subjects who have identical preferences but different scales. Note further that the policy implications of these differences may be quite significant: 1) if subjects have homogeneous preferences, but different intercepts as in case 1, one can try to change predispositions by targeting intercepts and/or one can try to change/initiate new programs/policies by targeting preferences because all individuals make similar tradeoffs; 2) in case 2, one can try the same strategies as in case 1, but one also can try to change individual variances. More generally, however, one is likely to be led to infer that tradeoffs DIFFER in case 2, when they do not; rather, the variances differ. Thus, different underlying processes lead to observationally equivalent outcomes (utilities and choice probabilities), but the processes that drive the outcomes are not the same. de Palma, Myers and Papageorgiou (1994) show that there are profound theoretical consequences if individuals differ in their ability to make decisions due to differences in the number of errors, so these issues have been the subject of theoretical study, but much more work is needed. Finally, but by no means exhaustively, even if all individuals use the same decision rules/utility specifications, utility values associated with attribute levels can differ across individuals in different ways. Such differences manifest themselves as utility scale unit measurement differences that vary from attribute to attribute, which is why one cannot do interdimensional utility comparisons without transforming to common units like WTP. Yet, utility scale units also can differ across individuals, which means that if one does not take the scale unit differences into account appropriately, there can be serious aggregation bias. I am unaware of any research that deals with the differences in individual utility scale units and their associated aggregation issues in the choice modeling literature.

Table 6: Observational Equivalence ≠ Process Equivalence
(Columns Intcept1-U1: case 1, constant scale with varying intercepts. Columns Intcept2-U2: case 2, same as case 1 but scale varies. Final column: observational equivalence, U1 x Scale2 = U2.)
Subject  Intcept1  Time1   Cost1   Scale1  U1    Intcept2  Time2   Cost2   Scale2  U2    U1 x Scale2
0        -1.00     -1.50   -1.00   1.00    1.50  -0.20     -0.30   -0.20   0.20    0.30  0.30
1        -0.75     -1.50   -1.00   1.00    1.75  -0.60     -1.20   -0.80   0.80    1.40  1.40
2        -0.50     -1.50   -1.00   1.00    2.00  -0.70     -2.10   -1.40   1.40    2.80  2.80
3        -0.25     -1.50   -1.00   1.00    2.25  -0.25     -1.50   -1.00   1.00    2.25  2.25
4        0.00      -1.50   -1.00   1.00    2.50  0.00      -3.00   -2.00   2.00    5.00  5.00
5        0.25      -1.50   -1.00   1.00    2.75  0.15      -0.90   -0.60   0.60    1.65  1.65
6        0.50      -1.50   -1.00   1.00    3.00  0.20      -0.60   -0.40   0.40    1.20  1.20
7        0.75      -1.50   -1.00   1.00    3.25  1.35      -2.70   -1.80   1.80    5.85  5.85
8        1.00      -1.50   -1.00   1.00    3.50  1.20      -1.80   -1.20   1.20    4.20  4.20
9        1.25      -1.50   -1.00   1.00    3.75  2.00      -2.40   -1.60   1.60    6.00  6.00
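The minimal sketch below reproduces the arithmetic of Table 6 to show the observational equivalence numerically: multiplying the case 1 utilities by the case 2 scales yields exactly the case 2 utilities, so "heterogeneous preferences" and "identical preferences with heterogeneous scales" generate indistinguishable choice probabilities. Values mirror Table 6; the code is illustrative only.

```python
# A minimal sketch of the observational equivalence illustrated in Table 6.
import numpy as np

intercept1 = np.array([-1.00, -0.75, -0.50, -0.25, 0.00, 0.25, 0.50, 0.75, 1.00, 1.25])
time1, cost1 = -1.50, -1.00                       # identical tastes for all ten subjects
scale2 = np.array([0.20, 0.80, 1.40, 1.00, 2.00, 0.60, 0.40, 1.80, 1.20, 1.60])

# Case 2 "preference parameters" are just case 1 parameters multiplied by the scales.
intercept2, time2, cost2 = intercept1 * scale2, time1 * scale2, cost1 * scale2

x_time, x_cost = -1.0, -1.0                       # attribute levels underlying Table 6

u_case2 = intercept2 + time2 * x_time + cost2 * x_cost                       # taste story
u_case1_rescaled = scale2 * (intercept1 + time1 * x_time + cost1 * x_cost)   # scale story

print(np.allclose(u_case2, u_case1_rescaled))     # True: identical utilities...
p = 1.0 / (1.0 + np.exp(-u_case2))                # ...hence identical choice probabilities
print(np.round(u_case2, 2))                       # matches column U2 of Table 6
```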
Let me close the discussion of the likelihood and consequences of there being individual differences in various aspects of decision processes by posing the general problem of combinatorics that underlies data used to make inferences in choice models. Suppose we design and administer an SP survey to a sample of 400 individuals in which we present every individual with the same eight scenarios, such that each scenario represents a health insurance policy described by levels of several attributes. Individuals simply respond yes or no to each of the eight descriptions to indicate if they would "consider it or not". So, the response data are binary (yes/no) indicators. How many possible response patterns are associated with this simple experiment? The answer is 2^8, because each individual can say yes or no to each of the eight scenarios. There are 256 possible response patterns, but only 400 subjects, so one is unlikely to observe the entire distribution of possible responses even in this simple case. Suppose we double or even triple the sample size? We might observe all the possible patterns (emphasis on might), but we would have little confidence in our estimates of the likelihoods of their true occurrence in that population. Now suppose we expand the problem to 16 scenarios typical of many DCEs. The number of possible patterns now is more than 65,000 (2^16)! Suppose further that instead of a binary response task, the DCE involves four choice options in each of the 16 sets. Now the problem becomes a 4^16, leading to about 4.3 billion possible patterns. If we could sample everyone on the planet we might have enough subjects to observe each pattern – if it occurs only once. Thus, those who advocate models that purport to "capture" distributions should tell the research community how it can possibly be that they have something meaningful to say based on the typical sample sizes found in academic and commercial research. What we can say is that one would have to be very lucky to have a very well-defined and compact underlying structure in the choice patterns in order to have a chance of getting reliable estimates of an underlying discrete or continuous distribution of preferences. Moreover, the vast majority of choice patterns are inconsistent with linear, additive utility functions (for 8 scenarios, only 6 of the 256 are consistent); hence, the probability of obtaining unbiased estimates from models that assume linear, additive utility specifications without interactions is low. More correctly, however, the number of patterns that one would EXPECT to observe in experiments depends on the preference directionality of the attributes.
For example, if all attributes are monotonic and have the same expected sign a priori, such as bus fares, travel times and walking distances, all patterns that imply positive signs would not be expected. Yet, even with that reduction, the number of possible patterns will be very large. Moreover, imagine how many more possible patterns can arise if different individuals were exposed to different scenarios, which is far more typical of DCEs and real market choice data. Thus, it seems fair to say that the "rush to complexity" not only has been premature, but the statistical modeling theory far outstrips the ability of empirical data sets to support the models. So, while we should encourage and reward mathematical and statistical elegance and prowess, the field also should invest in and reward empirical research to redress the large imbalance that now exists. That is, we need much more behavioural theory and empirical insights to develop models that can deal with the combinatorics as well as capture ways that individuals make real decisions instead of ways that theorists and analysts imagine they make decisions.

4.5 Model selection problems are complex

As the preceding discussion hopefully made obvious, the challenges in formulating appropriate models of underlying individual decision/choice processes are formidable. For example, let us view the general case as a "model selection" problem in which one wishes to identify the "correct" model from some range of possible candidate models. Consider a small example with three choice options (eg, A, B, neither), six attributes and three possible preference parameter (βk) distributions. The associated varcov(εi) matrix for the options has 2 identifiable diagonal terms + 2 off-diagonal terms; the (βk) variance-covariance matrix has 6 identifiable diagonal terms + 15 off-diagonal terms (for main effects). Thus, there are a total of 6 main effects + 6 diagonal terms + 15 off-diagonal terms for (βk) + 2 diagonal + 2 off-diagonal terms for (εi). Each term can be significant/not significant, so there are 2^21 x 3 (distributions) possible models that can be estimated (6,291,456). And that's for ONE data set, and excludes the possibility of interactions, which we already noted almost certainly must exist. Thus, it is fair to say that most published models actually are a sample of size one of possible models that could have been considered, and typically are estimated from ONE data set, which is a sample of size one of possible datasets that could be analysed. This again tells us that we need better theory and empirical evidence to guide research and minimize context dependence, while enhancing generalisability. "Better" theory will not come from statistics; it will come from careful consideration of fundamental, substantive behaviour(s) of interest and creative insights into individual and group behavioural processes. The preceding discussion also suggests that it will be hard to generalise complex models because one must parameterize moments of latent distributions to be functions of observable measures that can be forecast/observed in order to generalise. Unfortunately, few empirical choice modeling studies have demonstrated the ability of estimated models to generalise or "transfer" across time, place, samples, etc (although see Louviere, Hensher and Swait 2002, Chapter 8).
Thus, much research in choice modeling is subject to the criticism that the results represent little more than a statistical description of the brief history of a few minutes of time in the lives of a small sample of people. In the case of panel data, one has a very narrowly focused history of the purchase behaviour of a few months in the lives of a small sample of people.

4.6 The more complex the choice model, the harder to generalise

Ideally, one would like to pool cross-sectional, time series and other data to take advantage of the fact that some variables and processes are constant in some data sources. We can express the problem of generalizing more formally as follows: Y = f(X, Z, C, G, T), where Y = a vector of behavioral outcomes; X = a matrix of factors describing options/outcomes; Z = a matrix of individual/group factors; C = a matrix of context/environmental factors; G = a matrix of geographical/spatial factors; and T = a matrix of time-varying factors. The degree to which one can generalize ultimately depends on one's ability to specify "f" in the above expression to a reasonably close first approximation and/or the extent to which the components of each of the vectors are constant (or nearly so) across domains of interest. Indeed, if we expand the random component of utility to include a variety of "components of variance" (see, eg, Cardell 1997), we can easily see the issues: εi = εi(within) + εi(between) + εi(options) + εi(attributes) + εi(contexts) + εi(geography) + ... + εi(time) + ... If one views random components in this general form, it now becomes obvious that it is harder to generalise unless many of these components are constant. Moreover, it also will be harder to generalise more complex models because one needs to forecast "extra" latent parameters and/or account for differences in datasets to the extent that they are manifest in parameter and/or variance differences. Thus, choice models purporting to estimate segments, parameter distributions, etc, should be regarded as context dependent unless researchers provide proof to the contrary, and proof of ability to generalise should be made a publication requirement in health economics and related areas. Health economists should NOT emulate marketing academics and practitioners who rely on hold-out samples of choice sets drawn from the same distribution of choice sets used to estimate models, and use "hit-rates" or "correct choice classifications" as well as other dubious quasi-statistical measures to assess models. Instead, wherever possible, health economists should test models against real market behaviours, and/or validity tests should be designed to maximize the chance that the models will fail in certain ways to allow discrimination among models, rather than following the practice in marketing that amounts to little more than insuring "adequate" performance outcomes. Progress towards resolving the issues mooted above is likely to require significant changes in modeling strategy and the development of behavioural theory and/or new methods that allow one to discriminate among underlying processes. Three potential ways forward are suggested later in the paper, but there undoubtedly are others. What matters more generally is for the research community to recognise and admit that serious issues remain unresolved, and the issues are consequential. Recognition and admission of problems are first steps towards solving them.
In the next section we review evidence that random component variances vary systematically in response to a variety of factors, which further supports the notion that current models inadequately capture the underlying processes, which in turn suggests that new thinking, new theory and new methods are urgently needed.

5.0 Empirical evidence that demonstrates that error components vary systematically

5.1 Brazell, Gray-Lee and Louviere (1998)

BG-LL investigated the effects of survey length and scenario order on systematic and random components. They varied numbers of choice sets in two studies. In the first study they randomly assigned respondents to 12, 24, 48 and 96 sets of pairs of canned soup descriptions. In the second study they randomly assigned respondents to 16, 32, 64 or 128 sets of pairs of add-on local tours at Mexican holiday resorts. In both studies they also systematically varied the order in which the pairs in the first set of 12 (study 1) or 16 (study 2) pairs appeared, using a Latin square design. They found that the parameters of the systematic utility components were equal for all numbers of choice sets (pairs) and set orders once unobserved variability was taken into account. There also was no systematic relationship between number of sets or order and levels of unobserved variability, although significant differences in levels of unobserved variability were found.

5.2 Bunch, Brydon, Brazell and Louviere (1998)

BBBL studied the effects of survey medium, number of choice options and number of attributes on systematic component model parameters and unobserved variability. That is, they designed a series of experiments in which they systematically varied survey medium (paper/pencil, PC survey), number of options (3, 6) and number of attributes (6, 12). The PC survey tasks were constructed in such a way that subjects had to scroll down to see all the information, which was designed to simulate a web-based survey. As well, in the PC survey case, subjects could not change their earlier answers or look ahead. BBBL found that random component variability differed only for the condition in which there were six options and 12 attributes; however, the variability in this condition was about three times that associated with other conditions. BBBL also found significant differences in model intercepts for paper-and-pencil compared with PC conditions. It is unclear why this would be the case given that the subjects were randomly assigned to the experimental conditions.

5.3 Yeoh, Uldry, Louviere and Burke (1998)

YULB examined the effects of varying the presence/absence of attribute information on the parameters of models of systematic components and unobserved variability. That is, they designed an experiment to systematically vary branded/generic options, number of attributes (3, 6) and number of choice options/type of response (yes/no, 3 choices). YULB found that systematic component model parameters were the same once they controlled for unobserved variability differences. The primary impacts were on unobserved variability and model intercepts, such that more choices and more attributes increased unobserved variability, and most of the interactions between experimental factors were significant in explaining differences in unobserved variability across experimental conditions. They also found that model intercepts systematically varied as a function of the experimental conditions.
5.4 Swait and Adamowicz (2001)

SA hypothesised that the nature of choice sets and associated task complexity impact random component variability. SA developed an entropy-based definition of complexity, and tested whether their design-derived measures of task complexity were associated with unobserved variability in several different sources of choice data. They found that task complexity impacted parameter inferences and systematic component parameter estimates. They also showed that unobserved variability and task complexity are related in an inverse-U shape, such that unobserved variability increases with complexity up to a point and then decreases. The latter finding suggests that there may be an optimal level of design complexity, which is consistent with the next research result.

5.5 Severin (2001)

Severin took advantage of recently derived results concerning the statistical efficiency of choice experiments to investigate the degree to which optimally efficient experiments increase task complexity, and in turn decrease choice consistency (unobserved variability), which impacts overall respondent efficiency. Severin showed that as the number of attributes increases, the overall statistical efficiency of optimal designs declines due to lower respondent efficiency. Specifically, she designed experiments to investigate this hypothesis by varying numbers of attributes (6, 16) and numbers of attribute level differences (number of differences is nested under numbers of attributes) for two product categories (delivered pizzas and holiday island packages). She used theoretical results available in Street, Bunch and Moore (1999) to design optimally efficient pairs of choice options; subjects were randomly assigned to the various experimental conditions. She showed that overall respondent efficiency declines as attribute level differences increase, although for six attributes there was no appreciable loss of respondent efficiency. For 16 attributes, unobserved variability increases sharply, and the use of optimal designs cannot offset the loss in overall respondent efficiency. Her results suggest that the number of attribute differences drives unobserved variability, not the number of attributes per se.

5.6 DeShazo and Fermo (2001)

DF investigated ways in which the design of SP choice sets impacts unobserved variability. DF defined five measures of choice set complexity, of which two were broad measures – the amount of choice set information and the correlational structure of the information in the choice sets. DF found that their complexity measures were significantly associated with random component variability, albeit to different extents and with different levels of significance.

5.7 DeShazo and Fermo (2003)

DF investigated how task complexity impacts respondents' use of information. That is, unless individuals use full information, WTP results obtained from traditional DCEs will be biased. They show that subjects selectively use information as choice set complexity increases, such that different subjects use different attributes.

5.8 A potpourri of additional results

Dellaert (1996) showed that response variability differs for components of travel packages, such as transportation, accommodation and food. Dellaert, Brazell & Louviere (1998) showed that response variability differs systematically with differences in absolute prices and price differences for package tours.
Hutchinson, Kamakura and Lynch (2000) showed that differences in unobserved heterogeneity (a specific component of unobserved variability) can produce a variety of outcomes reported in the judgment and decision making/behavioural decision making literature, including preference reversals. They concluded that if one did not account for preference heterogeneity in preference and choice experiments, one could be led to incorrect conclusions about aggregate choice processes. Finally, de Palma, Myers and Papageorgiou (1994) developed theory to demonstrate that if consumers with less ability to make choices make more errors when comparing marginal utilities, the result is large differences in population choices, and at the aggregate level these differences look like different choice strategies, even if all individuals in fact use the same choice strategy.

Thus, the foregoing review of some of the available theoretical and empirical results related to unobserved variability strongly suggests that differences in the context, complexity, etc, of decision tasks can impact various subcomponents of response variability, such as unobserved heterogeneity, and failure to take this into account can (and does) result in spurious aggregate behavioural effects. Health economists who undertake choice modelling studies, whether based on experimental or real market data, therefore need to be aware of the possibility of serious bias if they fail to take unobserved variability into account. Similarly, differences in unobserved variability across individuals and groups pose significant issues for the interpretation and application of discrete choice models, and analysts would be well advised to exercise caution in interpreting the results of complex choice model specifications that impose and estimate latent quantities, such as distributions of preference parameters. One should be particularly careful in interpreting and applying WTP and related measures derived from such models because of the high likelihood of confounds in the underlying processes that cannot be separated with current modelling technology.

6.0 Quo Vadis?

It is important to ask where we are going or, more appropriately, how we can go forward to resolve some of the issues discussed in this paper. To address this question, we briefly discuss three general ways forward that have the potential to resolve some of the issues and problems previously described and discussed. These three ways forward are:

1. Develop a way to estimate choice models for single individuals.
2. Develop a way to specify and estimate models in latent WTP space instead of latent utility space.
3. Develop "model-free" ways to explore choice data to obtain insights into underlying processes and ways to a) properly aggregate individuals who choose alike, b) specify more realistic models of individual and aggregate choice processes, or c) both.

6.1 Develop a way to estimate choice models for single individuals

Currently a team of scholars consisting of UTS academics (Louviere, Street and Burgess) and international collaborators (Anderson and Marley) is working on the problem of how to design SP elicitation procedures that can lead to individual-level choice models. The project is funded by an Australian Research Council Discovery grant. Individual-level models are the "holy grail" in psychology and other fields, and many scholars have invested many years of effort, with relatively little progress to show for it.
The problem is inherently difficult because discrete choices contain little statistical information, suggesting that one potentially needs many observations per person to estimate reliable and valid models. The UTS team is taking a cross-disciplinary approach that involves a) optimally efficient, small designs for choice experiments, combined with b) new ways to maximise the statistical information obtained from each choice set. So far the UTS team has been able to estimate models for individuals for choice problems of modest size, such as four choice options, six attributes and 16 choice sets. They are preparing proofs of the measurement and behavioural properties of certain new classes of models that elicit more choice information from subjects. Papers dealing with the theory are currently being prepared.

6.2 Develop a way to specify and estimate models in latent WTP space

This approach was suggested by Louviere et al (2000) as a way to deal with the random component variance confounds previously discussed. A group of scholars at UCLA (Ainslie, Sonnier and Moorthy 2003) is working on this problem, and reports progress using Hierarchical Bayes procedures to estimate models of the following form:

Ui = Xi β1 + β2 ln(Pi) + εi, or equivalently Ui = (Xi γ − ln(Pi)) / μ + εi, where β2 = −1/μ and β1 = γ/μ.

They obtained interesting empirical results valuing brands using scanner panel data from US supermarkets. Their current approach does not appear to take the uncertainty in choices into account, but their results are encouraging.

6.3 Develop "model-free" ways to find patterns in choices, aggregate individuals with similar patterns and study their choice processes

Current choice modelling approaches impose models on choice data and estimate the parameters of the maintained model. This modus operandi precludes obtaining insights into choice processes and/or utility specifications (decision rules) other than those that are maintained/imposed. It also raises many aggregation issues, as well as questions about bias in estimates and derived policy measures, as previously discussed. If all individuals face a common set of choices, such as a common SP design, they potentially can be compared. As previously noted, if each individual can choose among J choice options in each of C choice sets common to all individuals in a sample population, there are J^C possible patterns of choices. As J or C increases, the number of possible patterns quickly grows large; hence, the reliability and validity of choice model estimates rely on the existence of significant structure in the underlying choice patterns. Several research groups are working on ways to logically aggregate individuals who make similar choices into groups/segments (eg, a team of UTS and UCSD researchers, Louviere and Carson). One can identify individuals with similar choice patterns and group them, as sketched below. Once grouped, one can estimate separate choice models for each group that lead to systematic and significant reductions in overall choice model log-likelihoods. A number of issues and problems remain unresolved, such as identifying a theoretically acceptable way to aggregate individuals who are "similar" but not exactly the same in their patterns of choices. If theoretically acceptable ways can be found to solve these problems, this should lead to the identification of groups of individuals with similar systematic and random components, which can potentially resolve the confounding issues previously discussed.
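The following sketch illustrates the pattern-grouping idea in Section 6.3 in its simplest possible form. It is an assumption for exposition only, not the UTS/UCSD team's actual method: the simulated "segments", the Hamming-distance threshold and the greedy grouping rule are all invented for this example.

```python
# Minimal sketch (assumed data and a naive grouping rule): treat each
# respondent's sequence of choices over a common design as a pattern, count
# the J**C possible patterns, and group respondents whose patterns are
# identical or nearly so (small Hamming distance).
import numpy as np

rng = np.random.default_rng(1)
J, C, N = 3, 8, 60            # options per set, common choice sets, respondents

# Simulated patterns: two latent "segments" with different modal patterns,
# plus noise, stand in for real choice data.
modal = [rng.integers(0, J, size=C) for _ in range(2)]
patterns = np.array([
    np.where(rng.random(C) < 0.8, modal[i % 2], rng.integers(0, J, size=C))
    for i in range(N)
])

print("possible patterns J**C =", J ** C)   # grows very quickly in J and C

def hamming(a, b):
    """Number of choice sets on which two respondents chose differently."""
    return int((a != b).sum())

def group_by_pattern(patterns, max_dist=2):
    """Greedy single-pass grouping: a respondent joins the first group whose
    seed pattern is within max_dist disagreements, else starts a new group."""
    seeds, groups = [], []
    for idx, p in enumerate(patterns):
        for g, seed in enumerate(seeds):
            if hamming(p, seed) <= max_dist:
                groups[g].append(idx)
                break
        else:
            seeds.append(p)
            groups.append([idx])
    return groups

groups = group_by_pattern(patterns)
print("groups found:", len(groups))
print("group sizes:", sorted((len(g) for g in groups), reverse=True))
# Separate choice models would then be estimated within each group and their
# combined fit compared with a single pooled model (reduction in log-likelihood).
```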
7.0 Discussion and Conclusions

We discussed issues related to combining data from a variety of sources to enhance model identification and statistical efficiency. This review covered combining different sources of preference information, measuring and using variables to reduce omitted variables bias, and combining real market and stated preference data. We also reviewed a number of issues related to applications of RUT-based choice models in health economics. We noted that there are several serious, unresolved issues, such as likely identification issues and confounds among estimated model parameters, parameter distributions and error variances, incommensurability of individual utility measurement units, and appropriate ways to aggregate individual utility functions. The review and associated discussion suggested several potential ways forward.

Our review of issues associated with the use of choice models suggests that there are a number of serious unresolved issues that can impact the interpretation and use of discrete choice models in health economics. That raises the question of whether health economists should continue to use choice models until the issues are resolved. The answer is that one always relies on and makes do with best practice, and choice models are best practice for addressing a wide array of problems in health economics. For example, despite confounds, choice models estimated from real market and stated preference data typically will predict choice outcomes accurately, including likely responses to policy changes, new programs, modifications to existing programs, and the like. Caution is advised in interpreting model estimates as true parameters, or distributions of model estimates as estimates of true parameter distributions. In these cases, the estimates obtained by analysts are likely to be biased, representing unknown and complex combinations of distributions of parameters and random component variances or scales, as well as other sources of omitted variables, such as omitted interaction effects. Thus, one needs to be cautious about drawing overly strong conclusions from one's results and/or making strong policy statements. Similarly, one might wish to confine policy analyses to observed and/or predicted choice probabilities instead of estimated utilities. The reason for this suggestion is that even if model estimates are biased, the models should predict choice probabilities accurately within the domain of application. Thus, one can infer WTP or other policy measures from changes in choice probabilities instead of changes in utilities.

We concluded our review by suggesting three potential ways forward to resolve the issues discussed: 1) developing theoretically sound and empirically applicable ways to model the choices of single individuals; 2) developing "model-free" methods by which to explore patterns in choice data that are common to samples of individuals, and using these methods to aggregate individuals into relatively homogeneous groups of choosers; and 3) developing theoretically sound and applicable models in latent WTP or other commonly commensurable latent domains that will allow individual preferences to be modelled and aggregated in a theoretically acceptable manner. We are optimistic that more realistic and theoretically appealing models soon will be available to health economists and others interested in choice behaviour.

References

Adamowicz, W., Louviere, J.J. and M. Williams (1993) "Combining Revealed and Stated Preference Methods for Valuing Environmental Amenities," Journal of Environmental Economics and Management, 26, 271-292.
Ainslie, A., Sonnier, G. and S. Moorthy (2003) "Who Values Private Labels?" Unpublished Working Paper, Department of Marketing, Anderson School of Business, UCLA.
Arrow, K.J. (1963) Social Choice and Individual Values, New York: Wiley.
Arrow, K., Solow, R., Portney, P., Leamer, E., Radner, R. and H. Schuman (1993) "Report of the NOAA Panel on Contingent Valuation," Federal Register, 58, 4601-4614.
Ben-Akiva, M. and T. Morikawa (1990) "Estimation of Switching Models from Revealed Preferences and Stated Intentions," Transportation Research A, 24A, 485-495.
Ben-Akiva, M., Bradley, M., Morikawa, T., Benjamin, J., Novak, T., Oppewal, H. and V. Rao (1994) "Combining Revealed and Stated Preferences Data," Marketing Letters, 5, 323-334.
Bishop, Y.M.M., Fienberg, S.E. and P.W. Holland (1975) Discrete Multivariate Analysis: Theory and Practice, Cambridge, MA: MIT Press.
Blamey, R., Bennett, J. and J.J. Louviere (2001) "Green Product Choice," Chapter 6 in Bennett, J. and R. Blamey (Eds) The Choice Modelling Approach to Environmental Valuation, Cheltenham, UK: Edward Elgar, pp. 115-132.
Brazell, J., Gray-Lee, J. and J.J. Louviere (1998) "Revisiting Order Effects: Are They Just Differences in Random Variation?" Paper presented to the Marketing Science Conference, Fontainebleau, France, July.
Bucklin, R.E. and C. Sismeiro (2003) "A Model of Web Site Browsing Behavior Estimated on Clickstream Data," Journal of Marketing Research, 40, 249-267.
Bunch, D.S., Brydon, K., Brazell, J. and J.J. Louviere (1998) "Comparing the Effects of Survey Medium, Numbers of Choice Options and Numbers of Attributes on Choices: Maybe It's Just Differences in Error Variability?" Paper presented to the INFORMS Marketing Science Conference, Fontainebleau (INSEAD), France, July.
Burgess, L. and D.J. Street (2003) "Optimal Designs for 2^k Choice Experiments," Communications in Statistics – Theory and Methods, forthcoming.
Cameron, T.A., Poe, G.L., Ethier, R.G. and W.D. Schulze (forthcoming) "Alternative Nonmarket Value-Elicitation Methods: Are the Underlying Preferences the Same?" Journal of Environmental Economics and Management.
Cardell, N.S. (1997) "Variance Components Structures for the Extreme Value and Logistic Distributions with Applications to Models of Heterogeneity," Econometric Theory, 13, 2, 185-213.
Carson, R., Anderson, D., Arabie, P., Bunch, D., Hensher, D., Johnson, R., Kuhfeld, W., Louviere, J., Steinberg, D., Swait, J., Timmermans, H. and J. Wiley (1993) "Experimental Analysis of Choice," Marketing Letters, 5, 351-368.
Dellaert, B., Brazell, J.D. and J.J. Louviere (1999) "The Effect of Attribute Variation on Consumer Choice Consistency," Marketing Letters, 10, 2, 139-147.
DeSarbo, W.S. and J. Choi (1999) "A Latent Structure Double-Hurdle Regression Model for Exploring Heterogeneous Consumer Search Patterns," Journal of Econometrics, 89, 423-455.
DeShazo, J.R. and G. Fermo (2001) "Rational Choice with Cognitive Limitations: Empirical Support for Partial-information Processing Strategies," Unpublished Working Paper, School of Public Policy and Social Research, UCLA.
DeShazo, J.R. and G. Fermo (2002) "Designing Choice Sets for Stated Preference Methods: The Effects of Complexity on Choice Consistency," Journal of Environmental Economics and Management, 44, 123-143.
Erdem, T. and M.P. Keane (1996) "Decision-making under Uncertainty: Capturing Dynamic Brand Choice Processes in Turbulent Consumer Goods Markets," Marketing Science, 15, 1, 1-20.
Finn, A. and J.J. Louviere (1992) "Determining the Appropriate Response to Evidence of Public Concern: The Case of Food Safety," Journal of Public Policy and Marketing, 11, 1, 12-25.
Hensher, D.A., Louviere, J.J. and J. Swait (1999) "Combining Sources of Preference Data," Journal of Econometrics, 89, 197-221.
Hutchinson, J.W., Kamakura, W.A. and J.G. Lynch, Jr. (2000) "Unobserved Heterogeneity as an Alternative Explanation for 'Reversal' Effects in Behavioral Research," Journal of Consumer Research, 27, 324-344.
Johnson, E.J., Hardie, B.G.S., Meyer, R.J. and J. Walsh (2003) "Observing Unobserved Heterogeneity: Using Process Data to Enhance Choice Models," Unpublished Working Paper, Department of Marketing, Wharton School of Business, University of Pennsylvania, Philadelphia, September.
Louviere, J.J. (1988) Analyzing Decision Making: Metric Conjoint Analysis, Sage University Papers Series Number 67, Newbury Park, CA: Sage Publications.
Louviere, J.J. (2001) "What If Consumer Experiments Impact Variances As Well As Means: Response Variability As A Behavioral Phenomenon," Journal of Consumer Research, 28, 3, 506-511.
Louviere, J. and G. Woodworth (1983) "Design and Analysis of Simulated Consumer Choice or Allocation Experiments: An Approach Based on Aggregate Data," Journal of Marketing Research, 20, 350-367.
Louviere, J.J., Meyer, R.J., Bunch, D.S., Carson, R.T., Dellaert, B., Hanemann, M., Hensher, D.A. and J. Irwin (1999) "Combining Sources of Preference Data for Modelling Complex Decision Processes," Marketing Letters, 10, 3, 187-204.
Louviere, J.J., Street, D., Carson, R., Ainslie, A., DeShazo, J.R., Cameron, T., Hensher, D., Kohn, R. and T. Marley (2002) "Dissecting the Random Component of Utility," Marketing Letters, 13, 3, 177-193.
Louviere, J., Fox, M. and W. Moore (1993) "Cross-Task Validity Comparisons of Stated Preference Choice Models," Marketing Letters, 4, 3, 205-213.
Louviere, J.J., Hensher, D.A. and J.D. Swait (2002) Stated Choice Methods: Analysis and Application, Cambridge, UK: Cambridge University Press, Second Printing.
Louviere, J.J., Street, D.J. and L. Burgess (2003) "A 20+ Years Retrospective on Choice Experiments," Chapter 8 in Wind, Y. and P.E. Green (Eds) Marketing Research and Modeling: Progress and Prospects, New York: Kluwer Academic Publishers.
Luce, R.D. and P. Suppes (1965) "Preference, Utility and Subjective Probability," in Luce, R.D., Bush, R.R. and E. Galanter (Eds) Handbook of Mathematical Psychology, III, 249-410.
McClelland, G.H. and C.M. Judd (1993) "The Statistical Difficulties of Detecting Interactions and Moderator Effects," Psychological Bulletin, 114, 2, 376-390.
McFadden, D. (2001) "Disaggregate Behavioural Travel Demand's RUM Side: A 30-Year Retrospective," in Hensher, D.A. (Ed.) Travel Behavioural Research: The Leading Edge, Amsterdam: Pergamon, 17-64.
McFadden, D. and K. Train (2000) "Mixed MNL Models for Discrete Response," Journal of Applied Econometrics, 15, 447-470.
Mitchell, R.C. and R.T. Carson (1989) Using Surveys to Value Public Goods, Washington, D.C.: Resources for the Future.
Ohler, T., Le, A., Louviere, J.J. and J.D. Swait (2000) "Attribute Range Effects in Binary Response Tasks," Marketing Letters, 11, 3, 249-260.
Revelt, D. and K. Train (1998) "Mixed Logit with Repeated Choices: Households' Choices of Appliance Efficiency Level," Review of Economics and Statistics, 80, 1-11.
Severin, V. (2001) "Comparing Statistical and Respondent Efficiency in Choice Experiments," Unpublished Ph.D. Dissertation, Discipline of Marketing, Faculty of Economics and Business, The University of Sydney (August).
Severin, V., Louviere, J.J. and A. Finn (2000) "The Stability of Retail Shopping Choices Over Time and Across Countries," Journal of Retailing, 77, 185-201.
Street, D.J., Bunch, D.S. and B. Moore (2001) "Optimal Designs for 2^k Paired Comparison Experiments," Communications in Statistics – Theory and Methods, forthcoming.
Street, D.J. and L. Burgess (2003) "Optimal and Near-optimal Pairs for the Estimation of Effects in 2-level Choice Experiments," Journal of Statistical Planning and Inference, forthcoming.
Swait, J.D. and J.J. Louviere (1993) "The Role of the Scale Parameter in the Estimation and Comparison of Multinomial Logit Models," Journal of Marketing Research, 30, 305-314.
Swait, J. and W. Adamowicz (2001a) "Choice Complexity and Decision Strategy Selection," Journal of Consumer Research, 28, 135-148.
Swait, J. and W. Adamowicz (2001b) "Choice Environment, Market Complexity, and Consumer Behavior: A Theoretical and Empirical Approach for Incorporating Decision Complexity into Models of Consumer Choice," Organizational Behavior and Human Decision Processes, 86, 2, 141-167.
Swait, J., Louviere, J. and M. Williams (1994) "A Sequential Approach to Exploiting the Combined Strengths of SP and RP Data: Application to Freight Shipper Choice," Transportation, 21, 2, 135-152.
Train, K. (1997) "Mixed Logit Models for Recreation Demand," in Kling, C. and J. Herriges (Eds) Valuing the Environment Using Recreation Demand Models, Brookfield, VT: Edward Elgar.
Wedel, M., Kamakura, W., Arora, N., Bemmaor, A., Chiang, J., Elrod, T., Johnson, R., Lenk, P., Neslin, S. and C.S. Poulsen (1999) "Heterogeneity and Bayesian Methods in Choice Modelling," Marketing Letters, 10, 219-232.
Yeoh, A., Uldry, P., Louviere, J.J. and S. Burke (1998) "Does Missing Information Affect Means or Variances of Error Distributions?" Paper presented to the INFORMS Marketing Science Conference, INSEAD, Fontainebleau, France, June.